Fast and accurate long-range phasing and imputation in a UK Biobank cohort
Abstract
Recent work has leveraged the unique genealogical structure and extensive genotyping (>30%) of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here, we develop a fast and accurate LRP method, Eagle, that extends this paradigm to outbred populations by harnessing long (>4 cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N=150K samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1–2 orders of magnitude faster than existing methods while achieving exquisite phasing accuracy (switch error rate ≈0.3%, corresponding to perfect phase at the scale of >10 Mb). Moreover, we observed that Eagle imputed masked genotypes with accuracy R²>0.75 down to a minor allele frequency of 0.1%. Compared to computationally tractable alternatives, Eagle attained large improvements in phasing and imputation accuracy at N=150K and smaller improvements at smaller sample sizes, illustrating the advantages that LRP-based imputation will yield as very large reference panels become available.
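The switch error rate quoted in the abstract can be illustrated with a minimal sketch. This is not Eagle's code: the function name `switch_error_rate` and the input convention (one list of 0/1 alleles per individual, restricted to heterozygous sites and describing the first haplotype) are assumptions made here for illustration. The metric counts how often the inferred phase flips relative to the true phase between consecutive heterozygous sites.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of consecutive heterozygous-site pairs where the phase flips.

    Both arguments are equal-length sequences of 0/1 alleles carried by the
    first haplotype at heterozygous sites only (an assumed input convention).
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")

    # True where the inferred first haplotype agrees with the true first
    # haplotype; at a heterozygous site, False means the two are swapped.
    orientation = [t == p for t, p in zip(true_hap, inferred_hap)]

    # With fewer than two sites there is no adjacent pair to evaluate.
    if len(orientation) < 2:
        return 0.0

    # A switch error occurs wherever the orientation changes between neighbours.
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)


if __name__ == "__main__":
    truth = [0, 1, 0, 1, 1, 0]
    inferred = [0, 1, 1, 0, 0, 0]  # phase flips after site 2 and again after site 5
    print(switch_error_rate(truth, inferred))
```

Note that a haplotype inferred entirely in the opposite orientation scores zero switch errors, since the labelling of the two haplotypes is arbitrary; only changes of orientation along the chromosome are counted.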
Similar resources
Recursive Long Range Phasing and Long Haplotype Library Imputation: Building a Global Haplotype Library for Holstein cattle
Long range phasing (LRP) is a fast and accurate rule-based method which uses information from both related and unrelated individuals by invoking the concepts of surrogate parents and Erdös numbers (Kong et al., 2008). Recursive long range phasing and long haplotype imputation (RLRPLHI; Hickey et al., 2009) is an extended LRP algorithm with increased robustness partially due to the extra long ha...
Identity-by-Descent-Based Phasing and Imputation in Founder Populations Using Graphical Models
Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identica...
The Significance of Biobanking in the Sustainability of Biomedical Research: A Review
A biobank, defined as a functional unit for facilitating and improving research by storing biospecimens and their accompanying data, is a key resource for advancement in life science. The history of biobanking goes back to the time of archiving pathology samples. Nowadays, biobanks have considerably improved and are classified into two categories: disease-oriented and population-based biobanks. U...
Evaluation and application of summary statistic imputation to discover new height-associated loci
As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-incr...